- North America > United States > Illinois > Cook County > Chicago (0.05)
- Asia > Nepal (0.04)
- Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)
- (2 more...)
- Information Technology > Security & Privacy (0.72)
- Transportation > Ground > Road (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Security & Privacy (0.92)
Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping
Nwatu, Joan, Bai, Longju, Ignat, Oana, Mihalcea, Rada
Culture shapes the objects people use and for what purposes, yet mainstream Vision-Language (VL) datasets frequently exhibit cultural biases, disproportionately favoring higher-income, Western contexts. This imbalance reduces model generalizability and perpetuates performance disparities, especially impacting lower-income and non-Western communities. To address these disparities, we propose a novel function-centric framework that categorizes objects by the functions they fulfill across diverse cultural and economic contexts. We implement this framework by creating the Culture Affordance Atlas, a re-annotated and culturally grounded restructuring of the Dollar Street dataset spanning 46 functions and 288 objects, publicly available at https://lit.eecs.umich.edu/CultureAffordance-Atlas/index.html. Through extensive empirical analyses using the CLIP model, we demonstrate that function-centric labels substantially reduce socioeconomic performance gaps between high- and low-income groups by a median of 6 pp (statistically significant), improving model effectiveness for lower-income contexts. Furthermore, our analyses reveal numerous culturally essential objects that are frequently overlooked in prominent VL datasets. Our contributions offer a scalable pathway toward building inclusive VL datasets and equitable AI systems.
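As a rough illustration of the kind of evaluation described above, the sketch below scores a single image with CLIP under object-name prompts versus function-centric prompts. The checkpoint, prompt strings, and image path are illustrative assumptions, not the paper's actual taxonomy or protocol.

```python
# Minimal zero-shot CLIP comparison of object-name vs. function-centric labels.
# Label strings and the image path are placeholders, not the Culture Affordance
# Atlas taxonomy.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("household_photo.jpg")  # hypothetical Dollar Street-style image

object_prompts = ["a photo of a stove", "a photo of an open fire pit", "a photo of a microwave"]
function_prompts = ["a place used for cooking food", "a place used for storing water"]

def zero_shot(prompts):
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, num_prompts)
    return logits.softmax(dim=-1).squeeze(0)

print("object-centric:", dict(zip(object_prompts, zero_shot(object_prompts).tolist())))
print("function-centric:", dict(zip(function_prompts, zero_shot(function_prompts).tolist())))
```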
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Supplementary Materials: A Extended Related Work (2)
We first discuss attacks that use physical objects as triggers, then discuss a few related works which use light as a trigger. We conclude by discussing the single proposed defense against physical backdoor attacks. As mentioned briefly in Section 2, [10] designs a backdoor attack against lane detection systems for autonomous vehicles. This attack expands the scope of physical backdoor attacks by attacking detection rather than classification models. Furthermore, it confirms the result from [43] that even when digitally altered images are used to poison a dataset, the triggers can be activated using physical objects (traffic cones in this setting) in real-world scenarios. A second work [31] evaluates the effectiveness of using facial characteristics as backdoor triggers.
Finding Naturally Occurring Physical Backdoors in Image Datasets
Wenger, Emily, Bhattacharjee, Roma (University of Chicago)
Extensive literature on backdoor poison attacks has studied attacks and defenses for backdoors using "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist most defenses targeting digital trigger backdoors. Research on physical backdoors is limited by access to large datasets containing real images of physical objects co-located with misclassification targets. Building these datasets is time- and labor-intensive. This work seeks to address the challenge of accessibility for research on physical backdoor attacks.
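One way to operationalize the search for such co-located objects is to mine an existing multi-label dataset for label pairs that frequently appear in the same image. The sketch below counts pairwise co-occurrences from a hypothetical (image_id, label) CSV; it is an illustrative approximation, not the authors' actual pipeline.

```python
# Hedged sketch: surface candidate physical-trigger classes by counting how often
# object labels co-occur in the same image. The CSV layout (image_id,label) is an
# assumption for illustration.
import csv
from collections import defaultdict
from itertools import combinations

labels_per_image = defaultdict(set)
with open("image_labels.csv", newline="") as f:  # hypothetical annotation file
    for image_id, label in csv.reader(f):
        labels_per_image[image_id].add(label)

cooccurrence = defaultdict(int)
for labels in labels_per_image.values():
    for a, b in combinations(sorted(labels), 2):
        cooccurrence[(a, b)] += 1

# Pairs that co-occur often are candidates: one class could act as a physical
# trigger for misclassifying the other.
for pair, count in sorted(cooccurrence.items(), key=lambda kv: -kv[1])[:10]:
    print(pair, count)
```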
- North America > United States > Illinois > Cook County > Chicago (0.41)
- Asia > Nepal (0.04)
- Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)
- (2 more...)
- Information Technology > Security & Privacy (0.92)
- Transportation > Ground > Road (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Security & Privacy (0.92)
PAC Bench: Do Foundation Models Understand Prerequisites for Executing Manipulation Policies?
Gundawar, Atharva, Sagar, Som, Senanayake, Ransalu
Vision-Language Models (VLMs) are increasingly pivotal for generalist robot manipulation, enabling tasks such as physical reasoning, policy generation, and failure detection. However, their proficiency in these high-level applications often assumes a deep understanding of low-level physical prerequisites, a capability that remains largely unverified. For robots to perform actions reliably, they must comprehend intrinsic object properties (e.g., material, weight), action affordances (e.g., graspable, stackable), and physical constraints (e.g., stability, reachability, or an object's state, such as being closed). Despite the widespread use of VLMs in manipulation tasks, we argue that off-the-shelf models may lack this granular, physically grounded understanding, as such prerequisites are often overlooked during training. To address this critical gap, we introduce PAC Bench, a comprehensive benchmark designed to systematically evaluate VLMs on their understanding of core Properties, Affordances, and Constraints (PAC) from a task executability perspective. PAC Bench features a diverse dataset with over 30,000 annotations, comprising 673 real-world images (115 object classes, 15 property types, and 1 to 3 affordances defined per class), 100 real-world humanoid-view scenarios, and 120 unique simulated constraint scenarios across four tasks. Our evaluations reveal significant gaps in the ability of current VLMs to grasp fundamental physical concepts, highlighting limitations in their suitability for reliable robot manipulation and pointing to key areas for targeted research. PAC Bench also serves as a standardized benchmark for rigorously evaluating physical reasoning in VLMs and guiding the development of more robust, physically grounded models for robotic applications. Project Page: https://pacbench.github.io/
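A minimal version of such a property/affordance probe can be framed as visual question answering: ask short questions about an image and score exact-match accuracy. The sketch below assumes a generic off-the-shelf VQA checkpoint and a toy list of (image, question, answer) probes; it is not the PAC Bench evaluation harness.

```python
# Hedged sketch of a PAC-style probe: query a VQA model about object properties,
# affordances, and states, then compute exact-match accuracy. Image paths and
# questions are hypothetical.
from PIL import Image
from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

# Hypothetical annotations: (image path, question, expected short answer).
probes = [
    ("mug.jpg", "Is this object graspable?", "yes"),
    ("mug.jpg", "Is this object made of glass?", "no"),
    ("drawer.jpg", "Is the drawer closed?", "yes"),
]

correct = 0
for path, question, expected in probes:
    prediction = vqa(image=Image.open(path), question=question)[0]["answer"]
    correct += int(prediction.strip().lower() == expected)

print(f"accuracy: {correct / len(probes):.2f}")
```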
- North America > United States > Arizona (0.04)
- Africa > Mozambique > Gaza Province > Xai-Xai (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
How to classify photos in 600 classes using nine million Open Images
If you're looking to build an image classifier but need training data, look no further than Google Open Images. This massive image dataset contains over 30 million images and 15 million bounding boxes. Plus, Open Images is much more open and accessible than certain other image datasets at this scale. For example, ImageNet has restrictive licensing. However, it's not easy for developers on single machines to sift through that much data. You need to download and process multiple metadata files, and roll your own storage space (or apply for access to a Google Cloud bucket).
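For instance, a first pass over that metadata might filter the image-level label CSVs down to a few classes before any pixels are downloaded. The sketch below assumes the V4/V5-era file and column names (class-descriptions-boxable.csv, train-annotations-human-imagelabels-boxable.csv); verify them against the release you actually use.

```python
# Hedged sketch: narrow Open Images image-level labels to a handful of classes.
# File and column names follow the older CSV layout and may differ in newer releases.
import pandas as pd

# Map machine IDs (e.g. /m/01yrx) to human-readable class names.
classes = pd.read_csv("class-descriptions-boxable.csv", names=["LabelName", "DisplayName"])
wanted = classes[classes["DisplayName"].isin(["Cat", "Dog", "Bicycle"])]["LabelName"]

labels = pd.read_csv("train-annotations-human-imagelabels-boxable.csv")
positives = labels[labels["LabelName"].isin(wanted) & (labels["Confidence"] == 1)]

print(positives["ImageID"].nunique(), "images cover the selected classes")
positives[["ImageID", "LabelName"]].to_csv("selected_image_ids.csv", index=False)
```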
- Information Technology > Services (0.37)
- Leisure & Entertainment (0.33)
nocaps: novel object captioning at scale
Agrawal, Harsh, Desai, Karan, Chen, Xinlei, Jain, Rishabh, Batra, Dhruv, Parikh, Devi, Lee, Stefan, Anderson, Peter
Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed 'nocaps', for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets. The associated training data consists of COCO image-caption pairs, plus Open Images image-level labels and object bounding boxes. Since Open Images contains many more classes than COCO, more than 500 object classes seen in test images have no training captions (hence, nocaps). We evaluate several existing approaches to novel object captioning on our challenging benchmark. In automatic evaluations these approaches show modest improvements over a strong baseline trained only on image-caption data. However, even when using ground-truth object detections, the results are significantly weaker than our human baseline - indicating substantial room for improvement.
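The split implied by this setup can be expressed in a few lines: an image whose object classes all have COCO training captions is in-domain, one with no captioned classes is out-of-domain, and anything mixed is near-domain. The class sets below are toy values, not the benchmark's actual label lists.

```python
# Hedged sketch of the in-/near-/out-of-domain grouping idea behind nocaps.
coco_classes = {"person", "dog", "bicycle"}  # toy stand-in for COCO-captioned classes

def domain(image_classes):
    seen = image_classes & coco_classes
    if not seen:
        return "out-of-domain"
    if seen == image_classes:
        return "in-domain"
    return "near-domain"

print(domain({"dog", "person"}))           # in-domain
print(domain({"dog", "accordion"}))        # near-domain
print(domain({"accordion", "jellyfish"}))  # out-of-domain
```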
- Transportation > Ground > Road (1.00)
- Leisure & Entertainment > Sports (1.00)
- Government (0.68)
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World
Shankar, Shreya, Halpern, Yoni, Breck, Eric, Atwood, James, Wilson, Jimbo, Sculley, D.
Modern machine learning systems such as image classifiers rely heavily on large scale data sets for training. Such data sets are costly to create, thus in practice a small number of freely available, open source data sets are widely used. We suggest that examining the geo-diversity of open data sets is critical before adopting a data set for use cases in the developing world. We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales. These results emphasize the need to ensure geo-representation when constructing data sets for use in the developing world.
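An audit along these lines reduces to two group-by operations over image metadata: the share of images per country, and classifier accuracy per country. The column names in the sketch below (country, true_label, predicted_label) are illustrative, not the schema used in the paper.

```python
# Hedged sketch of a geo-diversity audit: representation per country and
# per-locale accuracy from a hypothetical evaluation metadata file.
import pandas as pd

meta = pd.read_csv("image_metadata.csv")  # one row per evaluated image (assumed)

# Representation: how geographically concentrated is the dataset?
counts = meta["country"].value_counts(normalize=True)
print("share of images from top 5 countries:", counts.head(5).sum())

# Performance: accuracy broken down by locale.
meta["correct"] = meta["true_label"] == meta["predicted_label"]
per_country_acc = meta.groupby("country")["correct"].mean().sort_values()
print(per_country_acc.head(10))  # lowest-accuracy locales
```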
Deep Dive into Object Detection with Open Images, using Tensorflow - Algorithmia Blog
The new Open Images dataset gives us everything we need to train computer vision models, and just happens to be perfect for a demo! Tensorflow's Object Detection API and its ability to handle large volumes of data make it a perfect choice, so let's jump right in… Open Images is a dataset created by Google that has a significant number of freely licensed annotated images. Initially it contained only classification annotations, or in simpler terms it had labels that described what, but not where. After a major version update to 2.0, more annotations were added – of particular importance was the introduction of object detection annotations. These new annotations not only described what was in a picture, but where it was located, by defining the bounding box (bbox) coordinates for specific objects in an image.
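To make the "what and where" concrete, the sketch below reads the box annotation CSV and converts the normalized coordinates to pixel values for one image. The file and column names (train-annotations-bbox.csv, XMin/XMax/YMin/YMax) follow the familiar Open Images layout, and the local image path is an assumption.

```python
# Hedged sketch: convert normalized Open Images box coordinates to pixel boxes
# for a single locally downloaded image.
import pandas as pd
from PIL import Image

boxes = pd.read_csv("train-annotations-bbox.csv")
image_id = boxes["ImageID"].iloc[0]
image = Image.open(f"{image_id}.jpg")  # assumes the image file is already local
width, height = image.size

for _, row in boxes[boxes["ImageID"] == image_id].iterrows():
    left, right = row["XMin"] * width, row["XMax"] * width
    top, bottom = row["YMin"] * height, row["YMax"] * height
    print(row["LabelName"], (round(left), round(top), round(right), round(bottom)))
```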